Scientific  and  Technical  Report 


Multivariate  Nonparametric  Statistical  Techniques 

for 

Simulation  Model  Validation 


Contract  No.  DAAL01-96-C-0063 
Data  Item  No.  A002 


Prepared  for: 

DEPARTMENT  OF  THE  ARMY 
U.S.  Army  Research  Laboratory 
Information  Science  and  Technology  Directorate 
Building  394,  Room  207 
Aberdeen  Proving  Ground,  MD  21005-5066 

Attn:  Ms.  Ann  E.  M.  Brodeen,  AMSRL-IS-CI 


Prepared  by: 

Edward  C.  Larson  and  B.  Eugene  Parker,  Jr.,  Ph.D. 

BARRON  ASSOCIATES,  INC. 

Jordan  Building 
1160  Pepsi  Place,  Suite  300 
Charlottesville,  VA  22901-0807 
Telephone:  804-973-1215 

and 

H.  Vincent  Poor,  Ph.D. 

PRINCETON  UNIVERSITY 
Department  of  Electrical  Engineering 
School  of  Engineering  and  Applied  Science 
Princeton,  NJ  08544-5263 
Telephone:  609-258-1816 


October  21,  1997 


Approved  test  gublic  teiecs as5 

DtexflaatqB 


19971023  022 


Foreword 


This  document  is  the  Final  Technical  Report  for  Contract  No.  DAAL01-96-C-0063,  an  SBIR 
Phase  I  Award  funded  by  the  U.S.  Army  for  the  nine-month  period  from  21  June  1996  to  20  March 
1997.  The  research  effort,  which  was  based  on  the  7  July  1995  DoD  SBIR  Program  Solicitation 
95.3  Topic  A95-060  entitled  “Simulation  Model  Validation,”  was  carried  out  by  Barron  Associates, 
Inc.  (BAI)  of  Charlottesville,  Virginia.  B.  Eugene  Parker,  Jr.,  Ph.D.  served  as  Principal  Investigator 
for  BAI.  Mr.  Edward  C.  Larson,  also  of  BAI,  conducted  most  of  the  data  analysis  documented 
herein,  and  was  a  primary  contributor  to  this  report. 

The  authors  express  their  appreciation  to  Malcolm  S.  Taylor,  Ph.D.  (now  retired),  the  Gov¬ 
ernment  Technical  Representative,  Ms.  Ann  E.M.  Brodeen,  and  Barry  A.  Bodt,  Ph.D.,  for  their 
helpful  guidance  and  encouragement  of  BAFs  efforts.  The  collaboration  of  Dr.  H.  Vincent  Poor  of 
Princeton  University,  who  served  as  a  consultant  in  this  project,  is  gratefully  acknowledged. 

This  report  is  published  in  the  interest  of  scientific  and  technical  exchange.  Publication  does  not 
constitute  approval  or  disapproval  of  the  ideas  or  findings  herein  by  the  United  States  Government. 

Distribution  authorized  to  U.S.  Government  agencies  only.  Proprietary  information  not  owned 
by  the  U.S.  Government  and  protected  by  a  contractor’s  “limited  rights”  statement  determined  on  7 
June  1996.  All  other  requests  shall  be  referred  to  the  acting  Government  Technical  Representative 
at  the  Army  Research  Laboratory,  Building  394,  Room  207,  Aberdeen  Proving  Grounds,  MD 
21005-5066,  ATTN:  Ms.  Ann  E.M.  Brodeen,  AMSRL-IS-CI. 


i 


11 


Contents 


Foreword  i 

Abstract  v 

1  Introduction  1 

2  Classical  Techniques  for  Model  Validation  2 

3  Moment-Based  Tests  for  Correspondence  3 

3.1  Univariate  Moment-based  Tests .  4 

3.2  Multivariate  Moment-based  Tests .  7 

3.3  Generalization  to  Multiple  Test  Classes  .  12 

4  Multivariate  Hybrid  Tests  12 

5  Partial  Multivariate  Rank  Transformations  13 

5.1  Hybrid  Test  Method  1 .  13 

5.2  Hybrid  Test  Method  2 .  14 

6  Complete  Multivariate  Rank  Transformations  15 

7  Multinomial  Tests  19 

7.1  Application  of  Multinomial  Test  to  Bivariate  Poisson  Process .  22 

8  Conclusions  23 

9  SBIR  Phase  II  Effort  24 


iii 


List  of  Figures 

1  Elliptical  Contours  of  Bivariate  Gaussian  Distribution .  9 

2  First  Steps  in  Computing  a  Rank  Transform  by  Cutting  Strips .  17 

3  Rank-transforms  of  and  T\ .  17 

4  Probability  of  Obtaining  n#  Heads  on  100  Coin  Flips .  20 

5  Simulated  Distribution  of  A  Values  for  Poisson  Process .  23 

List  of  Tables 

1  Mfc  and  Sk  Values  for  Gaussian  and  Uniform  Distribution  Models .  5 

2  Moments  of  Test  Sample  7^  .  5 

3  Moments  of  Test  Sample  Te .  6 

4  Moments  of  Bivariate  A f  (0,1)  Distribution .  10 

5  Moment-based  Tests  for  AR  Model  Driven  by  Gaussian  Noise  and  Uniform  Noise  .  .  11 

6  Hybrid  Tests  for  AR  Model  Driven  by  Gaussian  Noise  and  Uniform  Noise .  18 

7  Moments  of  Bivariate  A/*(0, 1)  Distribution .  18 

8  Bin  Counts  for  T\  and  %  21 

9  A  Statistics  for  T\  and  T2 .  22 


IV 


Final  Technical  Report 


Multivariate  Nonparametric  Statistical 
Techniques  for  Simulation  Model  Validation 


Abstract 

This  report  documents  the  findings  of  an  Army  SBIR  Phase  I  study  on  multivariate  non¬ 
parametric  tests  for  stochastic  model  validation.  We  herein  introduce  a  method  for  generalizing 
rank  transformations  to  the  multivariate  domain  such  that  the  rank-transformed  set  is  uniformly 
distributed  in  multiple  dimensions.  This  furnishes  a  more  robust  hypothesis  testing  technique 
than  earlier  proposed  approaches  and  has  certain  computational  advantages.  This  approach  is  well 
adapted  for  continuous-output  models.  For  tests  based  on  partitioning  the  model  output  space  into 
bins  and  computing  a  confidence  statistic  based  directly  on  bin  counts,  as  opposed  to  computing 
statistical  moments,  we  introduce  a  log-likelihood  statistic  that  appears  to  be  an  excellent  summary 
indicator  of  correspondence  between  a  simulation  model  and  test  data.  The  approach  is  extremely 
versatile  and  well-adapted  to  discrete-output  models. 
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1  Introduction 

The  present  Phase  I  SBIR  study  is  concerned  with  multivariate,  nonparametric  statistical  tech¬ 
niques  for  validating  stochastic  simulation  models.  Model  “validation”  means  ascertaining  whether 
or  not  a  computerized  or  analytic  model  of  some  phenomenon,  within  a  certain  domain  of  appli¬ 
cability,  is  in  sufficiently  accurate  agreement,  or  correspondence ,  with  reality,  as  represented  by  a 
finite  set  of  test  data  [22].  Although  there  are  a  number  of  differing  interpretations  on  the  meaning 
and  operational  nature  of  simulation  validation  (See,  e.g.,  [7,  9,  12,  17,  23,  24,  25]),  the  focus  of  this 
study  is  solely  on  statistical  approaches  based  on  testing  for  correspondence  between  a  finite  em¬ 
pirical  body  of  test  data  and  a  large  population  of  exemplars  generated  by  a  stochastic  simulation 
model. 

Computer-driven  simulation  of  stochastic  systems  is  a  fundamental  analytical  technique  used 
throughout  engineering,  and  the  natural,  physical,  and  social  sciences. ^  Thus,  the  development  of 
inference-based  techniques  for  model  validation  is  of  widespread  interest.  Consequently,  there  has 
been  considerable  attention  devoted  to  this  problem  over  the  past  three  decades.  (See  [8]  for  an 
extensive  annotated  bibliography  on  this  subject.)  Prior  to  this  effort,  however,  work  in  this  field, 
with  few  exceptions  (e.g.,  see  [5]),  has  addressed  largely  only  univariate  nonparametric  techniques 
or  multivariate  parametric  techniques.  The  extension  to  multivariate  nonparametric  techniques, 
which  is  discussed  herein,  is  nontrivial  [19]. 

Multivariate-output,  or  multiple-response,  simulation  models  [6]  are  of  interest  in  a  wide  vari¬ 
ety  of  applications,  such  as  in  assessing  the  performance  of  communications  networks.  Validating 
multivariate  models  is  much  more  complicated  than  validating  univariate  models  because  of  po¬ 
tential  dependencies,  or  correlations,  among  the  various  output  variables.  An  organized  statistical 
approach  to  the  problem  of  validating  stochastic  univariate  simulation  models  was  proposed  and 
explored  in  [20,  21].  Since  applying  univariate  techniques  separately  to  each  output  variable  fails  to 
detect  such  cross-couplings,  development  of  sound  multivariate  testing  procedures  requires  special 
thought  and  consideration.  As  is  described  in  detail  below,  we  have  generalized  the  model  valida¬ 
tion  approach  to  problems  of  multivariate  simulation.  Although  a  number  of  parametric  methods 
for  approaching  this  problem  have  been  proposed  previously  (e.g.,  [2]),  our  focus  is  on  tests  that  are 
nonparametric  (i.e.,  distribution- free),  in  order  to  obviate:  (1)  the  undesirable  necessity  of  assuming 
a  population  model  for  experimental  data;  or  (2)  the  requirement  of  ascribing  an  exact,  analytic 
statistical  model  to  account  for  experimental  data. 

The  basic  framework  in  which  model  validation  will  be  considered  in  this  study  is  that  of  multi¬ 
sample  statistical  inference.  In  particular,  we  will  consider  situations  in  which  one  or  more  empirical 
realizations  of  the  phenomenon  or  phenomena  to  be  modeled  are  available,  as  well  as  one  or  more 
realizations  produced  by  the  simulation  model.  Given  two  such  sets  of  data,  we  wish  to  determine 
whether  the  simulated  data  capture  accurately  the  behavior  of  the  empirical  data.  That  is,  we  wish 
to  perform  a  test  of  homogeneity. 

In  the  present  report,  we  identify  and  highlight  the  key  features  of  the  few  multivariate  tech¬ 
niques,  parametric  and  nonparametric,  that  have  been  proposed  in  the  literature,  and  demonstrate 
how  they  can  be  generalized  or  modified  to  yield  more  versatile  and  robust  test  methodologies. 

t  An  application  of  particular  interest  to  the  Sponsor  is  the  simulation  of  communication  networks  [10],  in  which 
the  analyst  wishes  to  determine  network  performance  -  e.g.,  in  terms  of  mean  throughput  and  delay  -  as  a  function 
of  variable  control  factors  -  e.g.,  message  length,  message  arrival  rate,  and  transmission  mode  (single  channel  or 
frequency  hopping). 
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As  demonstration  vehicles  for  these  advanced  methods,  we  present  a  suite  of  case  examples  that 
include  (1)  a  random  number  generator;  (2)  an  autoregressive  moving  average  (ARMA)  process; 
(3)  a  coin-flipping  exercise;  and  (4)  a  Poisson  process.  These  simple  examples  represent  the  essen¬ 
tial  features  for  a  considerable  range  of  problems,  often  for  which  computer  simulation  is  the  only 
practical  method  of  analyzing  system  behavior. 

2  Classical  Techniques  for  Model  Validation 

Naylor  and  Finger  [13]  have  listed  several  of  the  best-known  techniques  for  model  validation: 

•  Analysis  of  Variance  (ANOVA)  -  Tests  the  hypothesis  that  the  mean  (or  variance)  of  data 
generated  by  a  computer  simulation  matches  that  computed  from  an  empirical  test  sample. 
Three  important  assumptions  underscore  this  technique:  normality,  statistical  independence, 
and  a  common  variance. 

•  x2  Test  -  Tests  the  hypothesis  that  a  sample  of  simulated  exemplars  has  the  same  frequency 
distribution  as  the  test  sample.  This  test  is  relatively  sensitive  to  non-normality  and  requires 
selection  of  categories  for  data  that  are  suitable  and  unbiased. 

•  Kolmogorov-Smimov  Test  -  This  is  a  nonparametric  test  that  involves  comparing  the  cu¬ 
mulative  frequency  distribution  of  the  simulated  and  true  process  data.  It  treats  individual 
observations  separately  and,  unlike  the  x2  test,  does  not  involving  partitioning  the  data  into 
bins.  However,  it  employs  the  normality  assumptions  inherent  in  x2  testing  and  does  not 
generalize  readily  to  the  multivariate  domain. 

•  Regression  Analysis  -  This  test  involves  regressing  the  true  process  data  on  the  simulated 
data  and  testing  whether  the  resulting  regression  equations  have  intercepts  that  are  not 
significantly  different  from  zero,  and  slopes  that  are  not  significantly  different  from  unity. 

•  Spectral  Analysis  -  This  involves  computing  second-  or  higher-order  spectra  (polyspectra)  of 
a  time  series  and  comparing  the  estimates  for  the  simulated  and  test  data. 

•  Other  Techniques  -  There  are  a  host  of  additional  potential  model  validation  techniques, 
some  of  which  have  already  been  investigated  by  others,  such  as  factor  analysis  and  TheiVs 
inequality  coefficient  Other  examples  include  comparison  of  data  compression  properties, 
comparison  of  data  state-space  reconstructions  (e.g.,  via  Taken’s  embedding  theorem)  [26], 
and  comparison  of  parametric  or  nonparametric  (e.g.,  neural  network)  model  predictions, 
structures,  or  parameter  values. 

In  summary,  the  important  distinguishing  characteristics  of  various  alternative  test  procedures 
are  the  validity  of  the  assumptions  implicit  in  them,  their  sensitivity  to  violations  of  those  assump¬ 
tions,  and  their  flexibility  in  applying  to  test  data  other  than  those  represented  by  just  one  empirical 
test  database  (e.g.,  extrapolation)  [24].  Most  of  the  above  techniques  are  either  parametric  or  uni¬ 
variate  in  scope  and  are  limited  in  their  domain  of  applicability  and  the  restrictive  assumptions  that 
they  employ.  Our  emphasis  herein  is  on  development  and  testing  of  multivariate  nonparametric 
techniques,  which  offer  wide  applicability  with  regard  to  such  assumptions  and  extensions  for  test¬ 
ing  of  stochastic  models.  Nonparametric  procedures  for  simulation  model  validation  do  not  rely  on 
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assumptions  regarding  the  statistical  distributions  of  data.  As  such,  they  are  particularly  valuable 
in  establishing  the  authenticity  of  simulations  in  which  high  confidence  cannot  be  placed  in  knowl¬ 
edge  of  the  true  distribution  of  empirical  data,  as  is  generally  the  case  with  real-world  data.  The 
use  of  multivariate  testing  obviates  the  need  to  perform  multiple  univariate  tests,  and  avoids  the 
pitfall  of  increased  probability  of  significant  findings  due  to  chance  alone.  The  use  of  multivariate 
statistics  also  increases  the  power  of  a  test,  as  univariate  tests  disregard  the  covariances  among  the 
variables  and  hence  use  less  of  the  information  available  about  a  set  of  observations. 


3  Moment-Based  Tests  for  Correspondence 

In  this  section,  we  elaborate  on  a  basic  theme,  namely  the  computation  of  moments,  common 
to  all  of  the  tests  introduced  in  the  literature  ([5,  15,  20,  21])  to  date.  This  technique  is  employed  in 
both  parametric  and  nonparametric  tests.  In  this  and  all  of  the  subsequent  sections,  our  objective, 
stated  formally,  is  to  ascertain  whether  a  proposed  stochastic  model,  M,  accounts  for  an  empirical 
test  sample,  T,  of  real-world  data.  The  fundamental  strategic  approach  of  all  moment-based  tests 
is  to  compare  certain  sample  statistics  of  T  (i.e.,  sample  moments)  with  the  values  that  they 
axe  expected  to  assume  under  the  null  hypothesis  that  M  accounts  for  the  test  data  in  T  or, 
alternatively  stated,  that  M.  and  T  are  in  correspondence  with  one  another,  viz.,  M.  * — *  T,  where 
‘D’  means  “as  a  distribution.” 

Since  T  is  only  a  finite  sample  (size  N),  there  will  generally  be  some  discrepancy  between  any 
sample  statistic  (say,  for  concreteness,  the  sample  mean)  computed  from  T  and  the  expected  value 
of  that  statistic  determined  from  theoretical  or  simulation  analysis  of  the  surmised  stochastic  model, 
M. .  To  see  why  this  is  so,  one  could  go  about  determining  the  expected  value  of  the  sample  mean 
for  T  by  generating  a  large  number  of  simulation  samples,  all  of  the  same  size  as  T,  and  creating  a 
scatter  plot  of  the  resulting  sample  means.  Even  in  the  limit  of  infinitely  many  simulation  sample 
sets,  the  distribution  of  sample  means  will  have  finite  variance.  According  to  the  Central  Limit 
Theorem  (CLT),  the  distribution  of  sample  means  will  be  approximately  Gaussian  with  mean  \i 
and  standard  deviation  a/VN,  where  /i  and  a  are  respectively  the  population  mean  and  standard 
deviation  of  the  output  distribution  characterizing  M.  It  follows  that  a  discrepancy  between  the 
computed  sample  mean  of  T  and  the  expected  value,  n,  is  “acceptable”  if  it  is  on  the  order  of 
cr/VN}  The  essence  of  moment-based  tests,  in  short,  is  to  determine  whether  the  discrepancies 
between  sample  statistics  computed  from  T  and  their  expected  values  are  acceptable  vis-a-vis  the 
standard  error  of  the  sample  statistics.  Unacceptably  large  discrepancies  are  grounds  for  rejecting 
M. . 

Justification  for  rejecting  the  null  hypothesis,  M  T,  is  deemed  either  appropriate  or 
inappropriate  based  on  the  scope  of  the  validation  tests  that  have  been  applied.  Although  the 
failure  of  a  model  to  pass  a  certain  battery  of  tests  may  permit  its  definitive  rejection  altogether,  it 
is  never  possible,  strictly  speaking,  to  proclaim  a  model  valid  based  on  the  results  of  a  finite  number 
of  statistical  tests,  no  matter  how  extensive.  One  can  say  only  that  a  model  has  survived  scrutiny 
or  that  it  has  not.  With  an  infinitely  large  test  sample  it  would  be  possible  to  make  definitive 
pronouncements  in  the  affirmative  as  well  as  the  negative,  but  that  is  not  the  case  in  practical 
model  validation  applications. 

In  all  of  the  stochastic  model  simulation  scenarios  considered  in  this  report,  it  is  assumed  that 

*Of  course,  not  all  tests  necessarily  use  this  type  of  critical  region. 
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simulation  exemplars  can  be  generated  in  arbitrarily  large  numbers  and  that  sufficient  randomness 
and  independence  among  the  generated  exemplars  can  be  achieved  (e.g.,  by  using  a  freshly  seeded 
random  number  generator  for  each  run).  As  a  result,  a  good  “picture”  of  the  output  probability 
distribution  function  (p.d.f.)  describing  M.  can  be  obtained.  Empirical  data,  by  contrast,  are 
generally  scarce. 


3.1  Univariate  Moment-based  Tests 

To  elucidate  the  main  steps  of  applying  moment-based  tests,  we  focus  on  the  simple  case  of  a 
stochastic  model,  M.  whose  output  is  a  univariate  Gaussian  distribution  of  population  mean  pi  and 
standard  deviation  a.  viz. 

P(  x)  =  -L=e-<*-<0!A’!  (1) 

g\J2'k 

where  P(x)  is  the  probability  density  characterizing  the  output  of  M.  We  shall  denote  such 
a  Gaussian  distribution  as  and  write  M  =  A f(facr)  to  express  the  statement  that  the 

output  of  M  is  a  a)  distribution.  Suppose,  for  instance,  that  we  are  attempting  to  build  a 
random  number  generator  that  nominally  returns  normally  distributed  numbers  and  that  we  wish 
to  find  out  whether  it  is  a  “good”  source  of  random  numbers. 

In  the  special  case  of  Gaussian  models,  it  is  convenient  to  apply  the  following  linear  transfor¬ 
mation  to  T,  viz. 

Zi  =  {xi-  pi) /a.  (2) 

This  way,  comparing  T  to  N(pi,  cr)  is  equivalent  to  comparing  Tz  =  {z\, . . . ,  z^}  to  the  canonical 
form  JV(0, 1),  which  has  zero  mean  and  unit  variance.  The  sample  statistics  that  are  most  frequently 
computed  in  parametric  tests  are  the  sample  moments,  viz., 


1  N 

mk  =  J^T  zi  fc=  1,2,3,... 


(3) 


If  N  is  not  too  small,  we  may  appeal  to  the  CLT  to  argue  that  the  probability  distributions 
characterizing  each  rrik  are  asymptotically  Gaussian  with  mean  and  standard  deviation  Sk 
given  by 


Mk  =  /xfc  (4a) 

Sk  =  (Tk/VN  (4b) 


where 


=  f  xkP(x)  dx 
J — oo 


<?k  = 


-oo 

roc 


-  1/2 

n 

B2k  ~  Hk 

1 1/2 


(5a) 

(5b) 


are  the  population  moments.  Note  that  [i\  =  /x  and  a\  =  a.  The  results  for  the  first  few  moments, 
in  the  canonical  case  of  fj,  =  0  and  a  =  1,  are  provided  in  Table  1.  The  general  results  in  Eqs.  4  and 
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5  do  not  depend  on  the  specific  functional  form  of  P{x).  Thus,  they  are  not  restricted  to  the  special 
case  of  Gaussian  models.  Results  for  the  case  of  a  uniform  distribution  over  the  unit  interval,  viz. 


f  1  if  0  <  z  <  1 
|  0  otherwise 


(6) 


are  also  provided  in  Table  1. 


Table  1:  Mk  and  Sk  Values  for  Gaussian  (left)  and  Uniform  (right) 
Distribution  Models 


k 

Mk 

sk 

k 

Mk 

sk 

1 

0 

7W 

1 

1/2 

WJWn 

2 

1 

VW 

2 

1/3 

V(4/45  )/N 

3 

0 

\Jlb/N 

3 

1/4 

y/(9/112)/N 

4 

3 

V96  /N 

4 

1/5 

V(16/225  )/N 

5 

0 

^945 JN 

5 

1/6 

V(25/396)/yV 

6 

15 

^10, 170/iV 

6 

1/7 

-y/  (36/637)  /TV 

The  inspiration  for  computing  higher-order  moments  is  accredited  to  Reynolds  ([20,  21]),  whose 
Uk  and  U£  statistics  are  essentially  the  same  as  rrik  —  M k  and  (m^  —  Mk)/Sk  respectively  for  the 
special  case  of  one  test  class  (see  remarks  below).  As  a  numerical  demonstration  of  a  univariate 
moment-based  test,  we  generated  a  test  sample,  71,  of  N  =  100,000  exemplars  using  a  MATLAB 
routine  that  nominally  returned  a  sequence  of  numbers  representing  an  Af(0, 1)  distribution.  Com¬ 
puted  sample  moments  are  presented  in  Table  2.  The  fourth  column  gives  the  confidence  statistic . 


Table  2:  Moments  of  Test  Sample  T\ 


k 

mk 

Zk 

1 

1.23  x  10~5 

0.0039 

0.5016 

2 

1.0022 

0.4919 

0.6886 

3 

-0.0086 

-0.7037 

0.2408 

4 

3.0120 

0.3877 

0.6509 

5 

-0.0544 

-0.5600 

0.2877 

6 

14.9884 

-0.0364 

0.4855 

a/c,  for  the  computed  value  of  m k  vis-a-vis  the  probability  distribution,  Af(Mkj  5/c),  associated  with 
it.  ak  is  defined  as  the  probability  that  the  fc’th-order  sample  moment  of  an  arbitrary  simulation 
sample,  <S,  of  size  N  generated  by  M  will  be  less  than  m^.  Since  the  probability  distribution  for 
mk  is  asymptotically  Gaussian  with  mean  M k  and  standard  deviation  S^,  it  follows  that  a k  is  equal 
to  Erf  (zfc),  where  Zk  =  (mjb  -  Mk)/Sk  is  the  ^-transform  of  mk  vis-a-vis  Sk)  and 


Erf  (:r) 


rx2/2  dz 


(7) 
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is  the  definite  integral  of  the  normalized  zero-mean,  unit-variance  Gaussian  distribution.  The 
value  03  =  0.2408,  for  example,  indicates  that  were  other  simulation  samples  of  size  N  =  100,  000 
generated  by  a  Af( 0, 1)  random  number  generator,  an  m3  value  less  than  —0.0086  would  be  obtained 
approximately  24%  of  the  time.  Extreme  values  of  a k  (e.g.,  a k  <  0.01  or  a*  >  0.99)  cast  doubt 
on  the  validity  of  M  as  a  candidate  model  to  account  for  T\.  The  occurrence  of  two  or  more 
extreme  values,  as  a  rule,  often  justifies  rejection  of  M.  The  actual  decision,  however,  of  whether 
to  reject  M  depends  on  the  outcome  of  a  decision  algorithm ,  which  typically  incorporates  specific 
threshold  settings  on  the  a^s  and  examines  their  values  in  toto.  The  formulation  of  such  decision 
rules  for  practical  applications  is  generally  a  complex  problem  and  hinges  on  such  considerations 
as  the  relative  penalties  ascribed  to  different  types  of  erroneous  decisions  that  may  occur. 

The  confidence  statistics  in  Table  2  all  have  nonextreme  values.  There  is  accordingly,  based 
on  the  scope  of  the  moment-based  tests  applied  thus  far,  no  justification  for  rejecting  the  null 
hypothesis  that  M  ^  -A/*(0, 1)  is  a  valid  model  that  accounts  for  T\.  As  an  example  of  what  happens 
when  the  univariate  moment-based  tests  are  applied  to  non-Gaussian  test  data,  we  generated  a  t- 
distribution  with  v  =  9  degrees  of  freedom  and  proceed  to  show  that  it  is  non-Gaussian.  Such  a 
i-distribution  is  obtained  by  extracting,  from  a  a)  population,  a  sample  of  size  v  +  1  =  10. 
The  ^-statistic,  defined  as 


m  —  jj, 
s/VN 


(8) 


is  computed  for  each  sample,  where  m  and  s  are  the  sample  mean  and  standard  deviation  respec¬ 
tively.  The  resulting  set  of  t-statistics  converges  asymptotically  to  a  i-distribution  with  v  degrees 
of  freedom.  An  expression  for  the  exact  functional  form  of  the  p.d.f.  may  be  derived  analytically. 
viz. 

1  r((i/  +  i)/2) 


T(t)  = 


V7 r 


rW2) 


~("+l)/2 


(9) 


The  reason  for  selecting  a  t-distribution  for  demonstration  was  that  it  “looks  like”  a  Gaussian 
distribution  in  certain  superficial  respects:  it  has  zero  mean,  finite  variance,  symmetry  about  zero, 
and  nonzero  probability  density  at  all  real  values  of  t .  It  is  thus  a  nontrivial  problem  to  ascertain 
whether  a  simulation  sample  drawn  from  a  t-distribution  fails  to  represent  a  J\f( 0,  a)  distribution, 
where  a  =  \Jvj{y  —  2)  is  the  analytically  computed  standard  deviation  of  the  ^-distribution.  To 
create  a  test  sample,  7^  we  generated  a  ^-distribution  [y  =  9)  of  N  =  10, 000  exemplars  and  applied 
the  moment-based  tests  under  the  null  hypothesis  JV( 0,  y/9/7)  T2 .  The  results  analogous  to 
those  in  Table  2  are  provided  in  Table  3,  in  which  m *  ~  1  [N  J2iLi  (U/\/9j7)k- 


Table  3:  Moments  of  Test  Sample  T2 


a 

mk 

Zfc 

Oik 

1 

0.0092 

0.9207 

0.8214 

2 

0.9860 

-0.9923 

0.1605 

.3 

-0.0246 

-0.6344 

0.2629 

4 

3.8386 

8.5589 

1 

5 

-1.1908 

-3.8736 

5.36  x  10'5 

6 

34.5354 

19.3714 

1 

SBIR  Topic  A95-060 


6 


Barron  Associates ,  Inc . 


Final  Technical  Report 


Multivariate  Nonparametric  Statistical 
Techniques  for  Simulation  Model  Validation 


It  is  evident  from  the  results  in  Table  3  that  the  failure  of  the  higher-order  moments  to  agree 
with  the  Mfc’s  under  the  assumption  of  Gaussianity  (on  the  left  in  Table  1)  is  grounds  for  rejecting 
the  null  hypothesis  A/O,  a/9/7)  T2.  The  numbers  in  Table  3,  in  toto ,  reveal  convincingly 

the  non-Gaussian  character  of  the  ^-distribution  test  sample,  7*2 .  Prom  this  simple  comparison 
demonstration,  it  appears  that  moment-based  tests  provide  a  powerful  means  of  model  validation. 


3.2  Multivariate  Moment-based  Tests 

We  show  herein  how  the  moment-based  tests  generalize  readily  to  the  multivariate  realm.  For 
concreteness,  we  focus  on  the  case  of  the  output  distribution  emerging  from  a  linear  bivariate  AR(1) 
process,  viz., 

( yi* )  =  ( Au  Ai’2 )  ( y i-*-1 )  +  ( ei-fc )  do) 

V  V2,k  J  {  A2, 1  A2)2  J  ^  2/2,*— 1  J  +  ^  62, fc  J  [  ] 

in  which  k  is  an  index  denoting  the  (discretized)  time,  y\  and  y2  are  a  coupled  pair  of  time- 
series  outputs,  ei  and  e2  are  a  pair  of  independent  noise  channels,  and  A  is  a  matrix  of  constant 
coefficients.  For  concreteness,  we  used  the  following  particular  set  of  A  values: 


/  0.2  0.1  \ 

(  -0.2  0.6  J  ' 


(11) 


If  the  noise  processes  are  both  V( 0,  l)-distributed,  the  second-order  static  (zero-delay)  moments 
can  be  shown  to  compute  to 


yi,kV2,k 


V  y\k  / 


\ 

*1,2 

ah 


1.0609  \ 
0.0599 
1.6063  j 


(12) 


Since  the  AR  process  is  a  linear  filtering  of  Gaussian  noise  inputs,  it  can  be  shown  that  the 
output  vector,  (yi,k  2/2,*)T  >  interpreted  as  a  static  distribution,  represents  a  bivariate  Gaussian 
distribution  with  population  mean 


M  = 


(13a) 


and  variance 


*1,1  *1,2  \ 
°2,i  ah  ) 


(13b) 


the  component  values  for  which  are  given  in  Eq.  12.  Note  that  in  the  multivariate  realm,  the 
population  mean  of  a  P-dimensional  probability  distribution  is  a  P  x  1  vector  and  the  covariance 
is  a  P  x  P  tensor. 

Given  a  simulation  sample,  <S,  of  N  —  100, 000  time-series  exemplars  generated  numerically  by 
the  AR  process,  we  wish  to  test  the  null  hypothesis  N{p,cr)  S.  Since  the  covariance  matrix  for 
an  arbitrary  zero-mean  probability  distribution  of  dimensionality,  P,  is  always  Hermitean  (positive 
definite  and  symmetric),  it  follows  that  <r2  may  be  diagonalized  with  respect  to  a  rotated  set  of 
orthogonal  coordinate  axes  (the  principal  axes  of  the  distribution)  in  the  P-dimensional  state  space. 
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If  one  effects  a  linear  transformation  of  the  state  coordinates,  y\  and  y2>  from  the  original  to  the 
rotated  frame,  the  resulting  probability  density  function  assumes  the  decoupled  form 


P{y[,y2,---,y'p) 


1  c-(y\)V  2S?‘ 

1  c-{yW/2-zi 

1  r-(y'P)2 /2Z2P 

_Ei\/27 r 

.E2\/27r 

.Sp\/27T 

(14) 


where  Ei, . . . ,  Ep  are  the  square  roots  of  the  eigenvalues 
nates,  viz., 


y[ 

y’P 


of  a2,  and  y'1,...,y'p  are  rotated  coordi- 


\ 

/ 


(15) 


where  A  is  a  proper  rotation  matrix  such  that  AAT  =  1  and  |A|  =  1.  In  the  bivariate  case,  the 
rotation  matrix  assumes  the  form 


(  cos  9  sin  9 
l  —  sin  9  cos  9 


(16) 


in  which  6  is  the  angle  of  rotation  between  the  unprimed  and  primed  coordinate  systems. 

Once  in  the  rotated  frame,  P  separate  ^-distributions  may  be  obtained  by  rescaling  the  coordi¬ 
nate  axes  to  normalize  the  variances,  viz., 


(  yi/Si 


V  y'p/Xp 


or,  in  matrix  form 

l'  =  . 

where  a  is  the  covariance  tensor  that  appears  as 


/Si  0  0  \ 

0  •.  0 
0  0  E  p  j 


(17) 


(18) 


(19) 


in  the  primed  system. 

To  assess  the  efficacy  of  the  multivariate  moment-based  tests  for  Gaussian  models,  we  generated 
a  test  sample,  T,  of  N  —  100, 000  exemplars  from  the  AR  model  in  Eq.  10.  The  noise  processes 
were  a  pair  of  independent  Af( 0, 1)  processes.  Hence,  we  expect  the  model  output,  viewed  as  a 
static  distribution,  to  be  a  zero-mean  bivariate  Gaussian  process  with  covariance 


/  1.0609  0.0599  A 
0.0599  1.6063  ) 


(20) 


as  in  Eqs.  12  and  13b.  We  wish  to  test  the  null  hypothesis  A/"( 0,  a)  T.  To  apply  the  moment- 
based  tests  for  Gaussianity,  we  first  transform  to  the  rotated  coordinate  frame  in  which  a2  is 
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diagonalized.  The  transformation  to  the  frame  in  which  the  variance  is  diagonalized  is  represented 
by  the  rotation  matrix  in  Eq.  15,  such  that 

(1  e|)=v2a_1  (21> 

where  £1  =  1.0544  and  £2  —  1-6128  are  the  square  roots  of  the  eigenvalues  of  a2.  In  this  case,  the 
rotation  angle  computes  to  6  —  —6.20°.  The  essence  of  the  rotation  is  illustrated  in  Fig.  1,  which 
shows  the  principal  axes  as  dashed  lines.  The  principal  axes  are  found  by  rotating  the  original 
coordinate  axes  in  the  two-dimensional  state  space  through  an  angle  9 .  For  more  complicated 
multivariate  problems  with  three  or  more  variables,  P  —  1  angular  variables  are  required  to  specify 
the  orientation  of  the  principal  axes.  The  contours  of  constant  probability  density  for  any  bivariate 
Gaussian  distribution  are  a  family  of  concentric  ellipses  whose  principal  axes  are  those  corresponding 
to  the  coordinate  frame  in  which  a2  is  diagonalized. 


Figure  Is  Elliptical  Contours  of  Bivariate  Gaussian  Distribution 

Having  effected  the  transformation  to  the  rotated  frame,  we  apply  the  component-wise  z- 
transform  of  Eq.  18  to  obtain  a  transformed  test  sample  set,  which  we  shall  denote  as  T£.  The 
hypothesis  jV(0,  I)  is  equivalent  to  J\f( 0,  a)  ►  T,  where  I  is  the  P  x  P  identity  matrix 

and  A/^O,  I)  denotes  a  set  of  P  independent  zero-mean,  unit- variance  Gaussian  distributions.  The 
left-hand  column  of  Table  5  provides  confidence  statistics  for  the  various  sample  moments, 
of  7^,  which  are  analogous  to  those  introduced  in  the  univariate  section.  For  each  sample  moment, 
a  2-statistic,  viz., 


and  a  confidence  statistic 

zkxM  -  (mfci,fc2  -  Mk^kz) / Sk^kl 

(22) 

are  computed,  where 

aki,kz  =  Erf  (zkl,k2) 

(23) 

Mkuk2 

=  Hkite  =  [  xkllx\2P{x\,x2)dxidx2 

J  —  CO 

(24a) 
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and 

skiM  =  crfci,fc2/v/ N  =  iV_1/2  [/i2fci,2fe2  -  Mfci,**]  (24b) 

are  the  expected  values  and  standard  errors,  respectively,  of  the  sample  moments.  They  are  anal¬ 
ogous  to  the  corresponding  statistics  introduced  in  the  univariate  section  and  are  related  to  the 
moments  of  the  probability  distribution  function.  The  analytically  computed  results  for  the  first 
few  moments  are  provided  in  Table  4. 


Table  4:  Moments  of  Bivariate  A/^O, 1)  Distribution 


ki 

VkiM 

&k\  ,&2 

1 

0 

0 

1 

2 

0 

1 

1 

1 

0 

i 

3 

0 

0 

n/1'5 

2 

1 

0 

v/3 

4 

0 

3 

796 

3 

1 

0 

715 

2 

2 

1 

78 

5 

0 

0 

7945 

4 

1 

0 

7105 

3 

2 

0 

745 

6 

0 

15 

710, 170 

5 

1 

0 

7^45 

4 

2 

3 

7306 

3 

3 

0 

15 

Using  Eqs.  22  and  23.  we  computed  the  moment-based  test  results  for  the  hypothesis  A/"(0, 1) 

77.  The  results  are  displayed  in  the  left  half  of  Table  5. 

The  numbers  in  the  left  part  of  Table  5  all  have  good  confidence  statistics  with  one  or  two 
possible  exceptions.  Based  on  the  scope  of  the  moment-based  tests,  we  cannot  justifiably  reject  the 
null  hypothesis  .7(0, 1)  7^,  or  equivalently,  7(0,  a)  77  T.  The  results  thus  far  uphold  the 

conjecture  that  the  output  of  the  AR  process,  when  driven  by  Gaussian  noise  processes,  is  itself 
Gaussian. 

As  an  example  of  a  test  sample  that  does  not  represent  a  Gaussian  distribution,  we  simulated 
the  same  AR  model  except  that  the  noise  channels  were  non-Gaussian.  The  channels  were  still 
independent,  but  their  outputs  instead  represented  a  uniform  distribution  over  the  interval  (—1,1). 
Note  that  the  probability  density  for  such  noise  sources  is 


where 


P(x )  =  \ 0(x  +  1)  —  \Q(x  —  1) 


6(x)  = 


1 

0 

undefined 


if  x  >  0 
if  x  <  0 
if  x  =  0 


(25) 

(26) 


SBIR  Topic  A95-060 


10 


Barron  Associates,  Inc . 


Final  Technical  Report 


Multivariate  Nonparametric  Statistical 
Techniques  for  Simulation  Model  Validation 


is  the  Heaviside  step  function. 


Table  5:  Moment-based  Tests  for  AR  Model  Driven  by  Gaussian 
Noise  (left)  and  Uniform  Noise  (right) 


Kyi 

~k^\ 

mfcl,fc2 

zk\  ,k2 

&ki  ,fc2 

i 

0.0013 

0.396 

■ 

0 

0.0026 

2 

0 

0.9991 

0.4247 

1 

1 

-0.0001 

-0.023 

0.4908 

0 

H 

0.9880 

-2.673 

0.0038 

3 

O 

0.5960 

2 

1 

1 

0.6558 

i 

2 

0.0089 

1.618 

0.9472 

0 

3 

-0.0028 

-0.226 

0.4106 

4 

0 

0 

3 

1 

-0.0852 

-6.959 

0 

2 

2 

0.9659 

-3.810 

6.9483  x  10"5 

1 

3 

0.0530 

4.329 

1 

0 

4 

2.4365 

-18.187 

0 

h 

'W'ki,k2 

Zk\  ,^2 

otkiM 

n 

-1.272 

0.1017 

BmIIm 

1.161 

0.8772 

u 

0.1017 

H 

H 

m 

0.8800 

2 

-1.811 

0.0351 

3 

mm 

0.4021 

2 

1 

0.0052 

0.946 

0.8284 

1 

2 

-0.0020 

-0.358 

0.3602 

ra 

3 

-0.0024 

-0.198 

0.4215 

4 

0 

3.0423 

0.9136 

3 

1 

0.0046 

0.376 

0.6465 

2 

2 

0.9874 

-1.410 

0.0793 

1 

3 

0.0087 

0.710 

0.7611 

0 

4 

2.9346 

-2.111 

0.0174 

Equivalently,  the  noise  channel  outputs,  e\  and  62,  are  described  by  the  joint  probability  density 
function 


P(yuy2) 


1/4  if  — 1  <  2/1  <  1  and  —  1  <  yi  <  1 
0  if  0  otherwise. 


(27) 


The  variance  of  the  uniform  zero-centered  distribution  is 

e1  =  -  f  x2dx  = 

2  y_i  3 


(28) 


It  is  therefore  fair  to  ask  whether  the  output  of  the  resulting  AR  process  is  distinguishable  from 
that  of  a  bivariate  Gaussian  process  where  the  variances  of  the  noise  channels  are  both  1/3.  With 
the  assumption  of  such  underlying  noise  processes,  we  generated  another  test  sample  of  size  N  = 
100, 000  and  transformed  the  AR  outputs,  y,  to  the  canonical  form,  z',  using  the  exact  same  steps 
as  above,  except  that  the  elements  of  a2  and  the  eigenvalues  were  all  scaled  down  by  a  factor  of  3. 
Results  are  tabulated  in  the  right  half  of  Table  5. 

The  confidence  statistics  for  this  case  are  considerably  poorer,  especially  for  the  fourth-order 
moments.  Some  of  the  z  and  a  statistics  are  clearly  out  of  the  mainstream.  This  provides  convincing 
evidence  that  the  output  of  the  AR  model  driven  by  uniform-distribution  noise  is  non-Gaussian, 
i.e.,  the  hypothesis  that  the  test  sample  corresponds  to  N(0:cr/V3)-  In  conclusion,  this  shows  that 
moment-based  tests  for  Gaussianity  extend  readily  to  the  multivariate  realm  and  are  effective  in 
examples  such  as  this. 

An  important  point  of  note  concerning  the  multivariate  formalism  introduced  in  this  subsection 
is  that  the  standard  error  of  the  first-order  moments  is  often  expressed  (e.g.,  Eq.  9  in  [15] ,  with 
Eq.  8  in  [15]  analogous  to  Eqs.  12  and  13b  above)  as  a  single  statistic,  viz,, 

v  =  E  p(y.i  ~  -  a)  (29) 

i=  1 
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in  which  y  denotes  the  NxP  matrix  of  all  test  exemplars.  This  statistic,  which  essentially  combines 
the  (kXi  k%)  =  (0, 1)  and  (fci,  k<i)  =  (1, 0)  rows  in  Table  5  into  a  single  statistic,  is  a  direct  measure 
of  how  many  “standard  errors  of  the  mean”  the  computed  sample  means  are  from  their  expected 
values  under  the  null  hypothesis.  Graphically  interpreted,  it  indicates  on  which  ellipse  in  Fig.  1  the 
computed  joint  sample  mean  lies. 

3.3  Generalization  to  Multiple  Test  Classes 

An  important  observation  by  Puri  [19]  concerning  model  validation,  in  general,  was  in  noting 
that  in  many  practical  applications,  it  may  be  useful  to  develop  validation  tests  for  models  that 
can  potentially  account  for  several  different  classes  of  test  samples.  Assigning  test  exemplars  to 
different  “classes”  is  useful  and  appropriate  if  the  various  test  exemplars  differ  a  priori  in  some 
identifiable  manner  (e.g.,  batches  of  data  collected  under  different  experimental  conditions)  that 
can  be  summarized  by  a  set  of  “inputs,”  or  control  knobs  {x^i, . . . ,  x^q}  in  which  there  are  Q  input 
parameters  characterizing  each  exemplar.  Within  each  class  (i.e.,  a  batch  of  test  data  all  having 
the  exact  same  input  vector),  it  is  assumed  that  the  exemplars  are  i.i.d.  If  the  input  variables 
Xi  were  the  same  for  each  i  (i.e.,  effectively  ignorable),  the  problem  reduces  to  one  of  classical 
homogeneity  testing  between  the  test  data  and  a  single  simulation  model§  designed  to  account  for 
all  of  the  test  exemplars,  i.e.,  determining  whether  or  not  the  two  sets  could  have  emerged  from 
the  same  probability  distribution.  A  number  of  well-known  nonparametric  tests  can  be  applied  to 
this  problem  (cf.,  [16]).  The  problem  is  somewhat  more  complicated,  however,  if  there  exist  two  or 
more  test  classes,  in  which  case  the  simulation  model  output  is  different  for  each  class,  since  the 
Xj’s  enter  the  model  as  a  set  of  parameters.  Each  class  is,  in  effect,  a  “special  case”  that  requires 
generating  a  simulation  sample  tailored  to  it.  If  there  are  C  classes,  the  model  validation  problem 
becomes  one  of  validating  C  null  hypotheses  simultaneously,  viz., 

Me  Tc  for  all  c  =  1, . . . ,  C.  (30) 

Since  it  is  intended  that  the  larger  model  account  for  all  C  classes,  an  overall  assessment  of  the 
model  validity  is  based  on  how  well  the  class- wise  null  hypotheses  hold  up  on  average.  Puri  proposes 
one  such  class  average,  viz., 

t4=4E  zk,c  (31) 

C  c—1 

in  which  ^,c  =  (™fc,c  “  ^fc,c)/S/c,c>  with  M/.)C  and  Sk,c  specific  to  class  c.  In  [20],  this  is  written 
as  U\  —  Ya=i  Ui  Us  =  Ya=i  etc->  which  n  and  i*  correspond  to  C  and  z^c  respectively  and 
division  by  n  is  omitted. 

4  Multivariate  Hybrid  Tests 

Another  theme  common  to  several  of  the  nonparametric  tests  that  have  been  proposed  ([5, 
15,  20])  is  rank  transforms.  We  herein  refer  to  these  tests  as  “hybrid  tests”  because  they  involve 
computing  moments  of  rank- transformed  distributions.  Alternative  tests  that  we  describe  later  also 
involve  rank  transforms,  but  do  not  involve  computation  of  moments. 

§A  single  model  having  different  parameter  settings,  corresponding  to  different  operating  conditions  Xi,  is,  for 
purposes  herein,  considered  to  represent  a  different  simulation  model. 
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In  the  univariate  domain,  the  rank-transform  TZ  =  {ri,...,rjv}  of  T  with  respect  to  M ,  is 
defined  such  that 

rvi 

n=  I  P(y)  dy  (32) 

J—OO 

where  P(y)  is  the  probability  distribution  of  M.  The  integral  expression  in  Eq.  32  means  that 
if  a  very  large  number  of  simulation  exemplars  is  generated,  r*  is  equal  to  the  percentage  of  such 
exemplars  whose  value  is  less  than  In  the  limit  of  infinitely  many  simulation  exemplars, 
Vi  asymptotically  approaches  the  value  of  the  integral  expression  in  Eq.  32.  If  the  simulation 
model  is  sufficiently  tractable  mathematically  that  P(y )  can  be  computed  analytically,  the  rank 
transformation  can  be  effected  by  direct  evaluation  of  the  integral.  Otherwise,  it  is  necessary  to 
generate  a  very  large  number  of  simulation  exemplars  and  count  the  number  of  exemplars  less  than 
each  yi. 

The  most  important  property  of  71  is  that  it  represents  a  uniform  distribution  over  the  unit 
interval  (0,1)  if  and  only  if  M  T.  We  will  denote  this  distribution  as  i/(0, 1).  The  rank 
transformation  maps  the  correspondence  problem  from  one  domain  to  another.  Instead  of  testing 
the  hypothesis  M  <-— ►  T  directly,  we  test  the  hypothesis  U( 0,1)  ►  1Z.  The  U( 0, 1)  character 

of  7Z  should  emerge  regardless  of  the  functional  form  of  P(y).  This  transforms  the  problem  to  a 
canonical  form  that  can  be  treated  on  a  more  unified  footing. 

Hybrid  tests  involve  the  application  of  moment-based  hypothesis  testing,  where  the  null  hy¬ 
pothesis  is  U{ 0, 1)  ►  7Z.  Just  as  sample  moments  were  computed  and  evaluated  vis-a-vis  Mk  and 

Sk  in  the  preceding  section,  in  which  a  Gaussian  distribution  was  assumed,  hybrid  tests  similarly 
involve  computation  of  the  sample  moments  of  TZ  and  their  evaluation  vis-a-vis  Mk  and  S&,  but 
with  respect  to  a  U{ 0, 1)  distribution.  The  V  and  V*  quantities  that  Reynolds  introduces  (see  [20]) 
are  analogous  to  U  and  J7*,  except  that  they  represent  moments  of  a  uniform  (rank-transformed) 
distribution. 

5  Partial  Multivariate  Rank  Transformations 

Extending  the  univariate  rank  transform  concept,  as  described  above,  to  the  multivariate  do¬ 
main,  is  not  a  simple  matter.  In  this  section,  we  describe  and  compare  two  methods  that  have  been 
proposed  for  effecting  rank  transformations  in  the  multivariate  domain.  In  [5],  Brodeen  and  Taylor 
propose  one  such  partial  multivariate  rank  transform.  A  related  test  was  developed  independently 
in  [15].  In  the  following  subsections,  we  clarify  the  distinction  between  the  two  approaches.  We 
refer  to  these  techniques  as  “partial”  rank  transforms  because,  even  though  they  perform  a  (uni¬ 
variate)  sort  separately  on  each  variate,  they  search  for  uniformity  only  with  respect  to  a  single 
variate.  This  contrasts  with  complete  multivariate  rank  transforms,  which  we  introduce  in  the  next 
section. 

5.1  Hybrid  Test  Method  1 

In  reference  to  Eq.  8  of  [5],  Brodeen  and  Taylor  seek  to  ascertain  whether  a  set  of  C  test 
classes,  {3c}c€i,...,C>  are  all  in  correspondence  with  one  another,  i.e.,  all  characterized  by  a  common 
p.d.f.  and  simulation  model,  AT  If  Tc  is  expressed  as  a  Nc  x  P  matrix  with  elements  yi#,c,  where  Nc 

*  Elsewhere  in  the  literature,  including  [15],  division  by  N  is  omitted. 
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is  the  number  of  exemplars  in  Tc  and  P  is  the  number  of  variates,  Brodeen  and  Taylor  amalgamate 
the  7^’s  vertically  into  a  N  x  P  matrix,  where  N  =  Ylc=i  Nc-  For  each  test  class,  they  examine  the 
distribution  of  the  variate- wise  ranks  which  is  the  percentage  of  exemplars  in  the  amalgamated 
set  whose  p’th  variate  is  less  than  yi,c,p*  If  the  null  hypothesis  T\  <-2-+  T2  M 

is  valid,  it  follows  that  the  set  { riiPiC}ieTc  should  be  distributed  uniformly  on  the  unit  interval  for 
each  c  and  p.  In  other  words,  each  test  class  should  mix  uniformly  among  all  of  the  other  classes 
on  a  variate- wise  basis. 

It  is  noted  that  the  variate- wise  uniformity  of  r^P}C  for  each  c  is  a  necessary,  but  not  sufficient, 
condition  for  the  null  hypothesis  to  be  vindicated.  Brodeen  and  Taylor  proceed  to  test  for  variate- 
wise  uniformity  by  computing  the  variance  statistic 


which  is  analogous  to  Eq.  29  herein  except  that  the  rank-transformed  set,  7£,  has  been  used  in 
place  of  the  model  outputs  themselves.  Since  each  {n,p,c}iei,...,;vc  is  hypothesized  to  be  uniform 
on  (0, 1),  the  means  of  the  rank-transformed  distributions  should  all  be  1/2.  a  in  Eq.  33  denotes 
the  variance  of  the  simulation  output,  <S,  rank-transformed  with  respect  to  M ,  where  <S  is  a  large 
simulation  run  generated  from  M  itself. 

In  applying  this  test,  Brodeen  and  Taylor  treat  only  the  case  of  C  =  1,  i.e.,  one  class  at  a 
time.  In  doing  so,  they  generated  a  simulation  sample,  <SC,  for  each  c  (corresponding  to  one  of 
several  different  possible  operating  conditions  for  a  communication  network),  amalgamated  Sc  and 
TCl  the  simulation  model  output  vectors  and  measured  test  data  vector,  respectively,  and  computed 
the  resulting  r^p s.  They  computed  the  variance  statistic  for  each  {riiPiC}ieii...yNc  set  to  test  f°r 
variate- wise  uniformity  in  the  test  class. 

5.2  Hybrid  Test  Method  2 

The  essential  difference  between  the  method  that  we  introduced  in  [15],  and  the  Brodeen  and 
Taylor  approach  described  above,  is  chiefly  a  matter  of  how  classes  are  interpreted  and  treated.  In 
[5],  the  problem  at  hand  involves  one  or  more  test  samples  that  are  conjectured  to  be  representations 
of  a  single  surmised  simulation  model.  The  validation  method  employed  in  [5]  is  then  one  of  rank¬ 
transforming  each  class  with  respect  to  that  simulation  model  and  computing  Vc  for  each  class. 
In  the  approach  of  [15],  we  were  concerned  with  a  model  validation  scenario  in  which  C  different 
simulation  models,  {A4c}cei,...,c>  were  available  to  account  for  identifiable  a  priori  differences  in 
the  various  classes  (e.g.,  different  operating  scenarios  in  a  communication  network).  For  each  class, 
we  generated  a  simulation  set,  <SC,  and  computed  ac  (written  as  Q  in  [15] )  as  the  variance  of  5C 
rank-transformed  with  respect  to  Mc •  For  each  c,  we  computed  a  Vc  statistic  identical  to  that  in 
Eq.  33,  except  that  ac  appears  in  our  formulation  in  lieu  of  <7.  We  then  average  the  VqS  over  all  of 
the  classes  to  obtain  an  overall  variance  statistic  that  indicates  how  well,  on  average,  AAC  accounts 
for  Tc.  The  formulation  of  [15]  enables  one  to  handle  the  extreme  case  in  which  all  of  the  test 
exemplars  are  taken  under  a  different  operating  condition. 

To  test  the  performance  of  the  simulation  model  validation  algorithm  of  [15]  in  an  application 
example,  we  applied  it  to  the  bivariate  AR  validation  scenario  described  in  3.2.  This  problem 
involved  a  surmised  model  with  U(— 1, 1)  noise  channels  and  test  samples,  T\  and  T2  (N  =  10,000 
exemplars  each),  generated  using  U(— 1,1)  and  Af( 0,  l/\/3)  noise  respectively.  C  =  1  in  this 
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example.  The  expected  value  of  V,  under  the  null  hypothesis,  is  unity.  The  computed  values 
were  0.9965  and  0.9181  for  7i  and  T2  respectively.  The  corresponding  confidence  statistics,  based 
on  standard  errors  of  y/2/N:  were  0.2475  and  5.7912.  Based  on  these  confidence  statistics,  the 
algorithm  is  effective  in  rejecting  M  < — ►  72,  and,  appropriately,  does  not  reject  M  71. 


6  Complete  Multivariate  Rank  Transformations 


The  partial  multivariate  rank  transformation  methods  such  as  those  described  above  do  not  map 
the  simulation  model  p.d.f.  onto  a  uniform  distribution  in  P  dimensions.  Nor  do  they  represent  true 
generalizations  of  the  rank  transform  methodology  to  the  multivariate  domain.  Rather,  they  effect 
univariate  transformations  on  a  variate-wise  basis.  As  a  result,  they  may  fail  to  detect  couplings 
in  the  model  outputs.  As  an  example  of  how  they  could  fail,  consider  the  bivariate  probability 
distribution 

P(x ,  y)  =  2x  +  2y  —  4  xy  (34) 

which  is  such  that  the  strip  integrals  in  both  directions,  fg  P(x,y)  dx  and  fg  P(x:y)  dy,  are  inde¬ 
pendent  of  y  and  x  respectively.  Since  this  p.d.f.  is  invariant  under  the  variate-wise  rank  transfor¬ 
mation,  the  partial  rank  transform  tests  would  fail  to  distinguish  it  from  a  uniform  distribution  on 
the  unit  square. 

In  this  section,  we  propose  a  complete  multivariate  rank  transform  that  does  map  an  arbitrary 
P-dimensional  p.d.f.  onto  a  true  multivariate  uniform  distribution.  For  a  multivariate  model,  M. 
characterized  by  a  p.d.f.  P(yi, . . . ,  yp )  and  a  test  sample  T  =  the  rank  transform,  7 Z, 

of  T  with  respect  to  AA  is  defined  as  the  set  where  ri  =  (r^i, . . .  yT^p)  is  the  P  x  1 

column  vector  such  that 

/Vi,  1  roo  roo 

/  •••/  P{yi,---,yp)dyi---dyP  (35a) 

-CO  7-00  7  —  00 

[Vi,  2  o  roo 

n,  2  =  /  /  •••/  P(y2,---,xp\yiii)dy2---dyp  (35b) 

7—oo  7—oo  7—oo 


P(yp\Vi,h 


■  • )  yi,P—i)  dyp. 


(35c) 


The  integral  in  Eq.  35a  is  computed  by  integrating  the  variable  y\  from  —  oo  to  y^\  and  all  of 
the  P  —  1  remaining  y  variables  from  — oo  to  oo.  An  elaborate  definition  of  the  multivariate  rank 
transform  is  necessary  to  account  for  correlations  among  the  P  variables  in  the  output  distribution. 
If  the  null  hypothesis  Ai  <-— >  T  is  valid,  it  follows  that  U( 0,l)p  7Z  ,  where  Z7(0, 1)^  is  the 
P-dimensional  analog  of  the  unit  interval,  i.e.,  the  set  of  all  ordered  P-tuples  such  that  the  value 
of  every  component  is  greater  than  zero,  but  less  than  unity.  For  the  bivariate  case  (P  =  2),  for 
example,  Z7(0,  l)2  is  the  uniform  distribution  on  the  unit  square  in  the  two-dimensional  xy-plane, 
the  probability  density  for  which  is 


P{x,y) 


1  if  0  <  x  <  1  and  0  <  y  <  1 
0  otherwise. 


(36) 


Alternatively,  the  bivariate  rank  transform  may  be  interpreted  as  the  inverse  of  a  filtering  operation 
that  takes  a  pair  of  independent  U{ 0,1)  random  number  generators  and  transforms  the  resulting 
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stream  of  numbers  to  emulate  the  output  distribution  of  the  model  in  question.  If  the  surmised 
model  is  in  fact  valid,  one  will  obtain  a  U{ 0,  l)p  distribution  irrespective  of  the  ordering  of  the 
variate  through  which  the  integrals  in  Eqs.  35  are  evaluated.  An  important  advantage  of  the 
complete  multivariate  rank  transform  is  that  it  maps  the  model  output  into  a  canonical  form  (viz., 
U(Q,  l)p),  for  which  the  moments  can  be  found  from  a  table  lookup  (whereas  a  has  to  be  computed 
for  each  individual  case  in  the  partial  rank  transforms). 

To  demonstrate  the  application  of  the  complete  multivariate  rank  transform,  we  selected,  as 
our  hypothesized  model,  M ,  the  output  of  the  same  bivariate  AR  model  in  the  preceding  section, 
but  with  an  independent  pair  of  U(— 1,1)  distributions  as  the  noise  channels.  The  test  samples 
were  derived  from  the  same  AR  process,  but  with  an  independent  pair  of  U(— 1, 1)  and  Af( 0,  l/\/3) 
distributions,  respectively,  in  the  noise  channels.  Both  test  samples  were  of  size  N  =  10, 000.  Thus, 
we  should  expect  the  hypothesis  M  T\  to  be  vindicated,  but  the  hypothesis  M  >  T2  to  be 
refuted. 

Since  the  noise  inputs  in  the  model  M  are  non-Gaussian,  the  probability  distribution  of  M 
cannot,  as  far  as  we  know,  be  determined  analytically,  unlike  in  all  of  the  preceding  cases  treated 
thus  far.  Thus,  it  is  not  possible  to  compute  the  rf  s  by  appealing  to  an  analytic  description  of 
the  model  output.  The  probability  distribution  characterizing  M  can  only  be  ascertained  (approx¬ 
imately)  by  generating  a  large  simulation  sample,  S .  In  accordance  with  Eq.  35a,  r;q  is  defined 
operationally  as  the  percentage  of  simulation  exemplars  in  S  whose  first  component  is  less  than 
Xit  1.  Since  it  is  not  possible  operationally  to  compute  the  rank  transform  for  7*^2  >  as  in  Eq.  35b, 
with  a  simulation  sample,  *S,  of  finite  size,  it  is  necessary  to  make  approximations.  We  generated  a 
simulation  sample,  S ,  of  500,000  exemplars  using  bivariate  U(— 1, 1)  noise  and  computed  the  rank 
transforms,  TZ\  and  7^2*  of  T\  and  T2  respectively  vis-a-vis  <S  through  the  following  steps.  The 
output  space  was  partitioned  into  a  10  x  10  grid,  as  shown  in  Fig.  2.  The  shaded  strip  pertains 
to  those  test  and  simulation  exemplars  for  which  the  computed  value  lies  in  the  third  decile, 
viz.,  0.2  <  r^i  <  0.3.  A  simulation  exemplar  is  assigned  to  the  strip  if  the  x\  value  of  that  ex¬ 
emplar  lies  in  the  third  decile  with  respect  to  the  entire  simulation  sample,  S.  A  test  exemplar  is 
assigned  to  the  strip  if  between  20%  and  30%  of  simulation  exemplars  assume  smaller  x\  values. 
The  finite  width  of  this  strip  provides  a  sufficiently  large  subset  of  exemplars  (on  average,  1/10  of 
N  =  10,000,  or  1,000)  such  that  the  distribution  of  X2  values  within  this  strip  can  be  observed. 
For  all  points  in  the  shaded  strip,  r^i  was  “snapped”  to  0.25,  corresponding  to  the  aq-centroid  of 
the  strip,  r^ 2  for  each  test  exemplar  in  the  strip  was  computed  by  counting  the  percentage  of  sim¬ 
ulation  exemplars  in  the  strip  that  assume  smaller  X2  values.  In  doing  so,  the  resulting  r*} 2  values 
are  snapped  to  the  a^-centroid  values  of  the  corresponding  deciles.  The  resulting  set  of  (^1,7^2) 
values  serves  as  a  good  working  approximation  of  the  rank  transform  that  would,  in  principle,  be 
obtained  from  Eqs.  35  were  a  virtually  infinite  simulation  sample,  <S,  obtainable  practically.  As 
the  number  of  simulation  exemplars  increases,  the  grid  mesh  can  be  made  finer,  and  thus  a  more 
accurate  approximation  of  the  exact  rank  transform  can  be  obtained. 

A  graphical  plot  of  the  rank-transformed  distributions  are  shown  in  Fig.  3,  from  which  it  is 
evident  visually  that  the  points  in  the  plot  for  T\  (on  the  left)  appear  to  be  more  uniformly 
dispersed  than  in  the  plot  for  T2  (on  the  right).  In  the  latter,  there  is  a  preponderance  of  points  in 
the  center. 
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yi 


Figure  2:  First  Steps  in  Computing  a  Rank  Transform  by  Cutting 
Strips 


Figure  3:  Rank-transforms  of  T\  (left)  and  (right) 


With  the  approximate  rank  transforms,  7^i  and  .7^2,  computed  thusly,  sample  moments  of 
TZ\  and  7^2  were  computed.  The  results  are  displayed  in  Table  6.  For  each  sample  moment, 
mkuk2,  the  ^-statistic,  zklM  =  {mklM  -  MklM)/Skuk2  is  provided,  where  Mku k2  =  /J,klik2  and 
SkiM  —  akifa/y/N  axe  computed  analytically.  The  results  for  the  first  few  moments  axe  given  in 
Table  7.  The  confidence  statistic,  akljk2  =  Erf  {zklik2),  is  defined  as  in  the  earlier  moment-based 
tests. 
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Table  6:  Hybrid  Tests  for  AR  Model  Driven  by  Gaussian  Noise 
(left)  and  Uniform  Noise  (right) 


k\ 

zk\,k2 

10  x  10 

50  x  50 

2/1 -cut 

t/2-cut 

2/i-cut 

2/2-CUt 

i 

0 

0.3291 

1.7251 

0.1732 

1.8041 

0 

i 

1.5935 

0.0624 

1.8755 

0.1891 

2 

0 

-0.0856 

1.3016 

0.0273 

1.6871 

1 

i 

1.3101 

1.2219 

1.3790 

1.3583 

0 

2 

1.2023 

-0.2969 

1.7785 

0.0545 

-0.2635 

1.0130 

-0.0152 

1.5844 

0.7158 

0.9452 

0.9085 

1.3200 

1.0052 

0.6738 

1.3824 

0.9520 

0.9358 

-0.4277 

1.6844 

0.0409 

4 

0 

-0.4214 

0.7230 

-0.0173 

1.4985 

3 

1 

0.3736 

0.6698 

0.6448 

1.2213 

2 

2 

0.5766 

0.5620 

0.9902 

0.9950 

1 

3 

0.7167 

0.4033 

1.2967 

0.7439 

0 

4 

0.6655 

-0.5546 

1.6086 

0.0704 

zk\  ,k2 

10  x  10 

50  x  50 

2/1 -cut 

2/2-cut 

2/i-cut 

2/2-cut 

-0.8799 

3.0415 

-1.0905 

3.0844 

3.0068 

-1.0115 

3.0567 

-1.1702 

-5.2946 

1.4271 

-4.6738 

1.8842 

2.1589 

1.9358 

1.4823 

1.9431 

1.3566 

-5.4260 

1.9866 

-4.7855 

-7.8424 

0.3607 

-6.7373 

1.0707 

-1.9633 

1.0890 

-2.1226 

1.4105 

1.2895 

-2.1599 

0.9772 

-1.5809 

0.2649 

-7.9786 

1.2652 

-6.8348 

-9.1501 

-0.3418 

-7.5150 

0.6699 

-4.6770 

0.2437 

-4.5128 

0.7671 

-1.8651 

-2.0477 

-1.9431 

-1.3851 

0.4318 

-4.7686 

0.4337 

-3.7780 

-0.4503 

-9.2963 

0.9357 

-7.5887 

Table  7:  Moments  of  Bivariate  N(0, 1)  Distribution 


h 

k2 

ftkifo 

& ki  ,&2 

l 

0 

1/2 

7Vi2 

2 

0 

1/3 

74/45 

1 

1 

1/4 

77/144 

3 

0 

1/4 

79/112 

2 

1 

1/6 

77/180 

4 

0 

1/5 

716/225 

3 

1 

1/8 

743/1344 

2 

2 

1/9 

756/2025 

5 

0 

-1/6 

725/396 

4 

1 

1/10 
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6 

0 
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5 

1 
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4 

2 

1/15 
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3 

3 

1/16 
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Table  6  tabulates  the  z-statistic  values  that  are  obtained  for  T\  and  T2  under  four  different  alterna¬ 
tive  approximation  methods  for  computing  the  rank  transform.  Whereas  the  preceding  discussion 
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appealed  to  a  10  x  10  grid  for  illustration  purposes,  a  different  mesh  size,  such  as  50  x  50,  is 
equally  admissible.  Results  for  these  two  alternative  grid  sizes  are  presented.  Computing  the  rank 
transform  by  cutting  strips  along  y<i  first  is  as  valid  as  taking  cutting  strips  along  y\  first;  results 
for  these  two  alternative  procedures  are  also  tabulated  in  Table  6  and  referred  to  as  “7/2-cut”  and 
“yi-cut.” 

The  z-statistics  in  Table  6  exhibit  consistently  moderate  values  for  T\  both  for  10  x  10  and 
50  x  50  meshes.  The  salient  results  are  insensitive  to  whether  the  yi-cut  or  y2-cut  method  is 
selected.  For  72,  on  the  other  hand,  a  disproportionate  number  of  z-statistics  in  all  four  scenarios 
lie  beyond  three  standard  errors.  These  results  suggest  that  M  ►  T2  be  rejected  based  on 
the  scope  of  the  multivariate  hybrid  test,  and  (correctly),  that  M  T\  not  be  rejected.  It 
is  especially  remarkable  that  good  agreement  between  the  computed  and  expected  moments  of 
the  rank-transformed  distributions  for  T\  was  obtained  for  both  the  10  x  10  and  50  x  50  meshes. 
This  speaks  favorably  to  the  robustness  not  only  of  the  complete  multivariate  rank  transformation 
method,  but  also  of  the  partition  method  through  which  the  rank  transform  was  computed. 

7  Multinomial  Tests 

In  this  section,  we  propose  a  multivariate  nonparametric  test  that  involves  partitioning  the 
rank-transformed  set  into  bins  and  working  directly  with  the  numbers  in  those  bins,  rather  than 
computing  moments.  The  method  is  also  extremely  pertinent,  as  will  be  demonstrated,  for  scenarios 
in  which  the  model  output  is  discrete  rather  than  continuous.  As  in  the  x2  test,  this  approach  in¬ 
volves  partitioning  the  rank-transformed  set,  7 Z,  into  bins,  and  evaluating  the  discrepancies  between 
the  observed  and  expected  numbers  of  observations  per  bin. 

The  x2  and  Kolmogorov-Smirnov  (KS)  tests  both  appeal  to  x2  distributions  to  compute  con¬ 
fidence  statistics.  The  implicit  assumption,  however,  is  that  the  asymptotic  conditions  for  appli¬ 
cation  of  the  CLT  are  satisfied.  This,  however,  is  not  always  the  case.  To  illustrate  the  point,  we 
show  herein  how  the  x2  test  yields  an  erroneous  confidence  statistic  in  the  simple  scenario  of  a 
coin-flipping  experiment. 

Suppose  that  a  fair  coin  is  flipped  100  times  and  that  55  heads  are  obtained.  We  wish  to  reach 
an  informed  impression,  based  on  the  results  of  this  test,  as  to  whether  the  coin  is  fair.  This 
means  computing  a  confidence  statistic  which,  in  this  case,  is  equal  to  the  probability  of  obtaining 
fewer  than  55  heads.  We  note  that  the  bin  counts  (i.e.,  number  of  heads  and  number  of  tails)  are 
distributed  as  a  binomial  distribution,  viz,, 

^  (37) 

in  which  N  =  100  is  the  total  number  of  flips,  n#  is  the  number  of  heads  obtained,  np  =  N  —  uh 
is  the  number  of  tails,  and  Ph  =  1/2  and  pp  =  l  —  pH  =  1/2  are  the  probabilities  of  a  single  flip 
yielding  heads  or  tails  respectively.  The  functional  form  of  P{uh)  is  illustrated  in  Fig.  4. 

The  confidence  statistic,  a,  is  therefore  equal  to  the  P(n# )  summed  from  n#  =  0  to  54.  The  exact 
result  is  a  =  0.8159. 

The  x2  test,  by  contrast,  yields  a  confidence  statistic  based  on  the  quantity 

2  =  (Oh  -  Eh ?  (Or  -  Ejf  =  (55  -  50)2  (45-  50)2 

X  Eh  +  Et  50  +  50 


1.00  (38) 
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Figure  4:  Probability  of  Obtaining  tih  Heads  on  100  Coin  Flips 


in  which  Oh  =  55  and  Op  =  45  are  the  observed  numbers  of  heads  and  tails,  and  Eh  =  Ep  =  50 
are  the  expected  counts.  Since  there  are  G  =  2  bins  in  this  case,  the  confidence  statistic  may 
be  obtained  by  evaluating  the  definite  integral  a  =  Jq  fu(v)dv  analytically,  where  fu  denotes 
the  standard  x2  function  with  v  =  G  —  1  =  1  degrees  of  freedom  and  x2  =  1.  One  obtains 
a  =  0.6827,  which  differs  significantly  from  0.8159.  This  illustrative  example  thus  indicates  that 
the  x2  test,  although  generally  very  effective  in  determining  “goodness  of  fit,”  does  not  necessarily 
yield  accurate  confidence  statistics.  The  x2  test  will  yield  accurate  a  values  only  if  the  asymptotic 
conditions  of  the  CLT  are  satisfied. 

The  KS  test  also  proves  unsatisfactory  in  the  context  of  this  simple  coin-flip  scenario,  but 
for  entirely  different  reasons.  KS  applies  easily  only  to  cases  in  which  the  output  of  the  stochastic 
model  in  question  is  continuous ,  as  opposed  to  discrete .  The  rank-transformed  set,  TZ ,  must  assume 
essentially  a  continuum  of  values  so  that  the  quantity  \r[  —  i/N |  can  be  computed  meaningfully. 
Furthermore,  the  KS  test  cannot  be  extended  to  the  multivariate  domain,  since  there  is  no  satis¬ 
factory  way  of  generalizing  the  notion  of  a  cumulative  distribution  function  to  probability  density 
functions  of  two  or  more  variables. 

These  .shortcomings  of  the  x2  and  KS  tests  motivate  the  development  of  an  alternative  non¬ 
parametric  test.  The  binomial  distribution  analysis  above  is  the  essence  of  such  an  alternative 
technique  that  we  herein  advocate  and  will  refer  to  as  the  multinomial  test.  This  test  seeks  to 
answer  the  following  question:  Suppose  that  we  have,  a  distribution  consisting  of  G  >  2  bins  and 
n\  observations  in  the  first  bin,  n<i  in  the  second,  ...,  uq  in  the  G’th  bin.  This  could  represent 
either  the  output  distribution  of  a  discrete-output  stochastic  process  or  a  partitioning  of  the  rank- 
transformed  output  of  a  continuous-output  process  into  G  cells.  Does  this  distribution  represent  a 
uniform  distribution  in  which  there  are  N/G  observations  in  each  bin,  where  N  =  Y^=i  ng  the 
total  number  of  observations? 

The  multinomial  test  models  the  bin  counts  in  such  scenarios  as  a  multinomial  distribution 
of  dimension  G,  which  states  that  if  N  observations  are  to  be  distributed  among  G  bins,  the 
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probability  of  finding  n\  observations  in  the  first  bin,  712  in  the  second,  etc.  is  equal  to 


P(ni,...,nG) 


N\ 


nil 


• nG 


n  1 

TPi 


■  -r)nG 

Pg 


(39) 


where  N  =  ng  is  the  total  number  of  incidents  and  pg  is  the  probability  of  any  particular 

incident  falling  into  the  g' th  bin.  For  rank-transformed  sets  in  continuous-output  stochastic  pro¬ 
cesses,  the  bin  occupancy  probabilities  are  equal,  viz.,  pg  =  l/G  ,  for  g  =  1, . . . ,  G.  The  expected 
value  of  ng  is  clearly  equal  to  Npg  —  N/G. 

The  multinomial  test  is  concerned  with  whether  the  bin  counts  in  toto  deviate  significantly 
from  the  most  probable  scenario  in  which  there  are  equal  numbers  of  observations  in  each  bin.  The 
technique  that  we  propose  herein  is  to  compute  the  log-likelihood  statistic 


A  =  -  In 


P(m,...,nc) 

P(N/G,...,N/G)m 


(40) 


in  which  P(ni, . . .  ,nG)  and  P(N/G , . .  *,N/G)  are  computed  as  in  Eq.  39.  For  given  N  and  G 
values,  the  A  statistic  is  characterized  by  a  well-defined  distribution  whose  functional  form  can  be 
ascertained  (approximately)  through  simulation.  For  example,  the  A  distribution  for  N  =  100, 
G  —  2  can  be  examined  by  performing  a  large  number  of  simulation  runs  in  which  a  coin  is  flipped 
100  times.  In  each  such  run,  the  number  of  heads  is  counted  and  the  corresponding  A  value 
recorded.  After  a  large  number  of  such  100-flip  runs,  the  A  distribution  emerges. 

The  multinomial  test,  like  the  x 2  test,  *s  extensible  readily  to  the  multivariate  domain.  The 
mathematical  formulation  is  no  different  from  in  the  univariate  domain,  and  is  simply  a  matter  of 
working  with  a  larger  number  of  cells.  In  Section  6,  we  sought  to  test  the  hypotheses  M  < — >  T\ 
and  M  72  by  partitioning  the  rank-transformed  sets  into  10  x  10  or  50  x  50  arrays.  The 
bin  counts  for  the  10  x  10  partition  for  T\  and  T2  are  provided  in  Table  8.  These  are  simply  the 
number  of  exemplars  that  are  snapped  onto  each  cell  centroid  in  the  rank  transform  computation 
algorithm  introduced  in  the  preceding  section.  The  upper  (lower)  two  tables  are  for  2/1-cut  (2/2-cut) 
rank  transforms.  The  second  column  of  the  first  row  of  the  upper  left-hand  table,  for  instance, 
indicates  that  for  T\,  there  were  111  incidents  for  which  0.1  <  <  0.2  and  0  <  r^2  <  0.1.  It  is 


Table  8:  Bin  Counts  for  T\  (left)  and  (right) 
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evident  visually,  from  casual  inspection  of  the  two  sets  of  numbers,  that  the  numbers  in  the  tables 
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on  the  left  are  more  uniform  and  closer  to  the  expected  value  of  100  than  those  in  the  table  on  the 
right. 

For  both  the  10  x  10  and  50  x  50  arrays,  we  computed  the  multinomial  probability  expressions 
in  Eqs.  39  and  40.  The  A  scores  are  tabulated  in  Table  9.  Upon  simulation  of  the  A  distribution  for 
(N  =  10, 000,  G  =  100)  with  100  runs,  the  mean  and  root-variance  of  the  distribution  appear  to  be 
approximately  50  and  7  respectively,  which  indicate  a  very  good  confidence  statistic  for  M  <-2->  7j, 
but  an  outlandishly  large  one  for  M  72.  Based  on  the  scope  of  the  multinomial  test,  we  can 
therefore  reject  the  latter  hypothesis,  but  the  former  hypothesis,  i.e.,  M  T\,  for  10  x  10  bins, 
can  clearly  not  be  rejected  based  on  the  scope  of  the  multinomial  test.  The  same  conclusions  follow 
from  a  50  x  50  partition,  where  a  simulation  of  the  A  distribution  for  ( N  =  10,000,  G  =  2,500) 
exhibit  a  mean  between  1,150  and  1,200  (see  Table  9)J 


Table  9:  A  Statistics  for  T\  (top)  and  72  (bottom) 
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7.1  Application  of  Multinomial  Test  to  Bivariate  Poisson  Process 

The  log-likelihood  statistic  from  the  multinomial  test  offers  an  extremely  useful  and  powerful 
technique  for  testing  rank-transformed  distributions  for  uniformity.  The  log-likelihood  approach 
is  also  applicable  to  other  types  of  discrete-output  models  because  it  is  geared  inherently  toward 
counting  discrete  events  in  bins.  As  an  example,  we  generated  a  bivariate  Poisson  process  with 

=  0.001  and  A b  —  0.003  (i.e.,  A^  and  A b  were  the  respective  probabilities  of  a  type  A  or  type 
B  event  in  a  given  time  step),  and  ran  it  for  100,000  time  steps. 

It  is  well  known  that  the  probability  of  exactly  Na  type  A  events  and  Nq  type  B  events  occurring 
in  a  duration  of  T  time  steps  is  P(Na)P{Nb),  where 

p(na)  =  -fi~r-e~XAT  (41a) 

p(nB)  =  (tr  ~e~w  (4ib) 

11  Phase  2  will  focus  on  generating  confidence  statistics  for  A  distributions  in  a  more  accurate  and  thorough  manner. 
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A  log-likelihood  score,  A,  can  be  computed  as 


•  P(Na)P(Nb )  ' 

P(XaT)P(XbT). 


(42) 


where  A aT  and  A qT  are  the  expected  number  of  type  A  and  type  B  events. 

By  generating  a  large  number  of  simulation  runs,  the  distribution  of  A  can  be  observed.  Fig.  5 
shows  a  histogram  plot  for  the  distribution  based  on  a  simulation  sample  of  200  exemplars.  By 
comparing  the  output  of  a  particular  test  case  (i.e.,  numbers  of  type  A  and  type  B  events)  to  the 
distribution,  a  confidence  statistic  can  be  obtained  and  the  validity  of  the  surmised  Poisson  model 
assessed. 


Figure  5:  Simulated  Distribution  of  A  Values  for  Poisson  Process 


8  Conclusions 

In  this  Phase  I  study,  we  investigated  several  fruitful  methods  of  developing  nonparametric  tests 
to  validate  multivariate  stochastic  models.  We  began  by  examining,  comparing,  and  reconciling  the 
methods  proposed  earlier  in  [5]  and  [15].  Both  of  these  methods  can  be  classified  as  partial  rank 
transformations,  in  that  they  effect  essentially  univariate  rank  transforms  on  a  variate- wise  basis. 
Both  methods  then  test  for  uniformity  in  the  resulting  rank  transformed  sets  by  evaluating  their 
moments.  The  methods  differ  chiefly  in  terms  of  how  they  interpret  and  treat  different  operating 
scenarios  for  the  test  and  model  data. 

Herein,  we  showed  how  to  construct  a  complete  multivariate  rank  transform  that  maps  the 
model  output  onto  a  uniform  distribution  in  P  dimensions.  This  method  is  a  true  generalization  of 
the  univariate  rank  transform  to  the  multivariate  domain  and  provides  a  robust  test  for  detecting 
dependencies  among  the  variates;  it  also  converts  the  output  distribution  into  a  canonical  form, 
thus  obviating  the  need  to  compute  the  variance  and  higher-order  moments  for  each  and  every  class 
(i.e.,  operating  scenario)  or  model. 
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We  also  showed  that  there  are  two  basic  ways  of  determining  whether  the  model  output  (or 
a  rank  transformed  output)  conforms  to  the  distribution  predicted  by  the  null  hypothesis.  One 
approach  involves  computing  moments  and  ascertaining  whether  the  moment  values  for  the  test 
sample  and  simulation  model  output  are  in  good  agreement.  All  of  the  previous  attempts  to 
construct  nonparametric  multivariate  model  validation  methods  espouse  this  basic  strategy.  The 
alternative  approach  is  to  partition  the  output  distribution  into  bins  and  ascertain  whether  the 
observed  and  expected  bin  counts  are  in  good  agreement.  We  showed  how  the  multinomial  test 
methodology  provides  a  more  accurate  correspondence  assessment  than  the  x2  test  and  how  the  A 
statistic  serves  as  an  excellent  summary  indicator  of  overall  agreement.  Both  techniques  appear  to 
be  extremely  effective  for  continuous-output  models,  but  the  bin-partition  approach  is  especially 
suitable  for  discrete-output  models. 

9  SBIR  Phase  II  Effort 

In  Phase  II,  Barron  Associates,  Inc.  will  propose  undertaking  the  following  endeavors:  (1) 
assess  the  relative  performance  and  computational  complexity  of  the  various  algorithms  discussed 
herein  on  more  elaborate  synthetic  test  problems  than  was  possible  under  the  Phase  I  effort;  (2) 
explore  methods  for  developing  decision  algorithms  (e.g.,  thresholds)  to  achieve  certain  specified 
false-positive  and  false-negative  rates  in  various  scenarios;  and,  very  significantly,  (3)  apply  these 
methods  to  a  real-world  stochastic,  multivariate,  model  validation  problem.  Pursuant  to  the  second 
goal,  we  plan  to  explore  the  asymptotic  threshold  methodology  (based  on  the  sizes  of  the  test  and 
simulation  samples)  that  we  suggested  in  [15].  The  third  effort  will  shed  light  on  the  special 
considerations  and  intricacies  in  applying  these  mathematical  tools  to  practical  problems.  One 
such  pioneering  effort  in  this  direction  involving  forestry  data  was  reported  in  [20].  Although  this 
study  focused  on  univariate  techniques  only,  its  purpose  was  to  prove  the  efficacy  of  the  proposed 
methodology  in  a  situation  involving  real-world  empirical  data. 

As  an  example  of  a  real-world  stochastic,  multivariate,  model  validation  problem,  consideration 
might  be  given  to  assessing  an  Army  combat  radio  communications  network,  such  as  that  discussed 
in  [10],  for  which  a  simulation  program,  laboratory  experimental  data  (conducted  on  a  combat  radio 
network  at  the  Ballistic  Research  Laboratory),  and  benchmark  results  [5]  are  available.  (Note  that, 
regardless  of  the  choice  of  real-world  problem,  the  basic  model  validation  procedure  will  be  the 
same.)  The  purpose  of  the  communications  network  experiment  of  [10]  was  to  quantify  the  effect 
of  three  control  factors,  viz.,  message  length,  message  arrival  rate,  and  transmission  mode  (single 
channel  or  frequency  hopping),  on  network  performance  (defined  by  mean  throughput  and  delay). 
These  experiments  were  designed  specifically  to  enable  the  use  of  statistical  techniques  to  determine 
the  effect  of  the  various  control  factors  on  network  performance.  For  this  problem,  a  simulation  of 
the  communications  network  has  already  been  coded  using  the  commercially-available  OPNET  suite 
of  tools.  The  communications  network  simulation  problem  is  one  with  widespread  applicability, 
and  so  simulation  validation  for  this  problem  will  likely  be  of  considerable  interest  and  relevance 
to  the  Army,  as  well  as  others. 

As  an  alternative,  BAI  could  readily  acquire  other  simulation  software  and  test  data  in  related 
application  areas  (e.g.,  computer  and  communications  networks),  as  well  as  unrelated  application 
areas  (e.g.,  fixed-  and  rot  ary- wing  aircraft  flight  control  systems).  As  a  representative  example 
of  the  latter,  consider  the  validation  of  the  NASA  F-18  High  Alpha  Research  Vehicle  (HARV). 
At  the  NASA  Dryden  Flight  Research  Center  (DFRC),  a  six-degree-of- freedom,  nonlinear,  batch 
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simulation  represents  one  process.  The  other  process  is  a  hardware- in- the-loop  (HIL)  simulation, 
which  includes  the  F-18  mission  computer  and  flight  control  computer.  The  HIL  simulation  provides 
a  more  realistic  test  of  the  RFCS  control  law  implementation  for  performance  validation  of  the 
research  flight  control  system  (RFCS)  control  laws.  In  actuality,  neither  process  can  be  considered 
the  “true”  process,  as  both  simulations  have  simplifications  and  each  can  (and  has)  been  used 
to  discover  “anomalies”  with  the  other.  Present  NASA  validation  procedures  include  comparing 
outputs  from  the  batch  simulation  and  the  HIL  simulation  for  a  number  of  test  cases  that  specify 
the  test  flight  conditions  and  command  inputs.  Plots  of  the  two  simulations  are  compared  manually 
to  determine  if  significant  differences  exist  that  may  indicate  anomalous  or  unexpected  behavior. 
Manual  validation  of  the  HIL  model  against  the  batch  simulation  is  very  time  consuming.  Each 
anomaly  discovered  must  be  investigated  and  resolved  by  NASA  engineers  prior  to  flight  testing 
the  RFCS  on  the  F-18  HARV  aircraft.  This  manual  review  process  represents  a  central  bottleneck 
in  the  process  of  validating  a  new  RFCS  for  the  F-18  HARV,  often  precluding  additional  test  cases 
that  might  otherwise  prove  useful. 

It  is  noted  that  these  simulation  models  may  be  more  accurately  classified  as  deterministic  than 
stochastic,  even  though  there  are  stochastic  components  due  to  turbulence  and  other  noise  sources 
(e.g.,  EMI).  New  approaches,  well-founded  mathematically,  are  needed  to  address  such  problems 
and  would,  we  believe,  demonstrate  clear  performance  superiority  and  practicality  (from  the  user’s 
point  of  view)  not  only  over  the  manual  validation  approach,  but  also  over  such  “neural  network 
feature-based  approaches”  as  are  currently  being  pursued  by  some  (see,  e.g.,  [1]). 

As  suggested  by  the  latter  application  area,  a  potential  thrust  of  the  Phase  II  program  might 
be  to  focus  effort  on  the  validation  of  models  of  non-stochastic  systems,  such  as  continuous-state 
dynamic  systems,  which  are  described  by  differential  equations,  and  deterministic  discrete-event 
systems,  which  include  temporal  logic,  min-max  algebraic  models,  finite  state  machines,  and  Petri 
nets. 

Still  another  alternative  for  the  Phase  II  effort  might  be  to  address  simulation  model  validation 
problems  across  the  most  commercially-relevant  classes  of  models,  with  the  goal  of  creating  a 
“toolbox”  of  procedures  for  users  covering  many  types  of  simulation  models. 
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